(1) Department of Ecology and Evolutionary Biology, University of California – Los Angeles
(2) Department of Microbiology and Molecular Genetics, University of California – Los Angeles
(3) National Park Service
(+) Corresponding Author
- Email: gkandlikar@ucla.edu
- Phone: (+1) 952-288-7351
- Mailing Address: Dept. of Ecology & Evolutionary Biology, 621 Charles E. Young Drive S., Los Angeles, CA 90095
Keywords: environmental DNA; data visualization; citizen science; community science; shiny; metabarcoding; education; community ecology
Environmental DNA (eDNA) metabarcoding is becoming a core tool in biodiversity monitoring and conservation, and is a promising way to go beyond species inventory to systems-level analyses of community ecological dynamics. Results from eDNA analyses can inform and inspire research scientists, natural resource managers, students, community scientists, and naturalists; however, there is a dearth of easily accessible data exploration tools for this diverse audience. Here we present the R package ranacapa, at the core of which is a Shiny web-app that helps perform exploratory biodiversity analyses and visualizations of eDNA results. The web-app accepts multiple formats of taxonomy tables, and requires a simple metadata file with descriptive information about each sample. The app allows users to explore the data with interactive figures for instant community ecology analysis. We demonstrate the usability of ranacapa by multiple user groups, including the National Park Service, a public community science program, and an undergraduate microbiology course.
The targeted amplification and sequencing of DNA that living organisms shed into the physical environment they occupy (termed “environmental DNA metabarcoding”, or “eDNA sequencing”) is revolutionizing microbiology, ecology, and conservation research (Taberlet et al. 2012, Deiner et al. 2017). Sequencing of DNA extracted from field-collected soil, water, or sediment samples holds great promise to shed light on a range of questions, from tracking the dynamics of bacterial communities and profiling the composition of ancient plant and animal communities (Props et al. 2017, Pedersen et al. 2014) to monitoring populations of rare or endangered species (Balasingham et al. 2017). As the cost of eDNA sequencing declines and sample collection techniques become more streamlined (Thomas et al. 2018), professional research scientists are also beginning to use eDNA sequencing as a platform to partner with members of the community, such as natural resource managers, undergraduate students, and citizen scientists (collectively referred to in this manuscript as “community scientists”), in primary research. However, developing robust and impactful community science partnerships that engage the community in all steps of the research process remains a challenge.
eDNA sequencing-based projects work well for community science partnerships because non-experts can be quickly trained to collect samples in the field and because eDNA sequencing is an exciting framework for descriptive and hypothesis-driven research pertinent to disciplines such as medicine, agriculture, ecology, and geography (Deiner et al. 2017). The community partners in such programs can have heterogeneous backgrounds, ranging from curious members of the public for whom collecting samples in the field is their first scientific research experience (e.g. University of California’s CALeDNA program, http://www.ucedna.com/), to professional natural resource managers who regularly collaborate with research scientists (e.g. Center for Ocean Solutions’ eDNA project, https://oceansolutions.stanford.edu/project-environmental-dna). In these partnerships (as in any other), community participants should be able to participate across multiple stages of the project, not only in sample collection (Pandya 2012; https://ecsa.citizen-science.net/sites/default/files/ecsa_ten_principles_of_citizen_science.pdf). This can be a challenge for eDNA sequencing-based community science programs because, although it is relatively easy to train community partners to collect eDNA samples, it is far more challenging to train them to interact with and visualize the results from these studies.
Engaging community partners in data exploration and analysis phases of eDNA sequencing-based research projects is challenging because these studies generate datasets that are large, multidimensional, and stored in idiosyncratic formats (e.g. BIOM tables). Indeed, learning the bioinformatic tools necessary for managing and analyzing these data is a major hurdle even for professional researchers (Mulder et al. 2018). To address this challenge, we created an R package “ranacapa”, at the core of which is a Shiny webapp that allows users to make exploratory visualizations and perform simple community ecology analyses with results from eDNA sequencing studies. ranacapa complements existing visualization platforms (e.g. Phinch, Phyloseq-Shiny, QIIME2 Viewer) because, in addition to interactive visualizations, it includes brief explanations and links to additional educational resources that give users an overview of basic data analyses used in eDNA studies. ranacapa works with community matrices generated via QIIME () and stored as BIOM tables or with community matrices generated with the Anacapa sequence analysis pipeline (https://github.com/limey-bean/Anacapa), which is used extensively by the CALeDNA program.
In the remainder of this manuscript, we describe ranacapa and demonstrate its use by two community science partnerships based at the University of California, Los Angeles (UCLA): first, a collaboration between eDNA researchers and resource managers at the National Park Service, and second, a partnership between community ecology researchers and an undergraduate microbiology course at UCLA. As we show in the Use Cases, empowering community partners to interact with the data and perform simple but insightful community ecology analyses can help make these collaborations more enriching and valuable to both parties.
ranacapa consists of a Shiny webapp (Chang et al. 2017) and two categories of helper functions (Table 1). The first set of functions converts the taxonomy tables, generated either by the Anacapa eDNA sequence analysis pipeline (https://github.com/limey-bean/Anacapa; Curd et al. in prep) or QIIME (Caporaso 2010), into phyloseq objects that can be used for downstream visualizations and analyses. The second set of functions, which includes two externally written functions openly available on GitHub, extends the visualization and statistical functionality of the phyloseq (McMurdie and Holmes 2013) and vegan (Oksanen et al. 2018) packages.
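As a rough illustration of what the first category of helper functions has to do, the following base R sketch separates a flat taxonomy table into a count matrix and a rank-by-rank taxonomy matrix, the two pieces that phyloseq's otu_table() and tax_table() constructors accept. It is not ranacapa's own code: the column name sum.taxonomy, the six-rank semicolon-delimited layout, the function name split_taxonomy, and the toy values are all assumptions made for this example.

```r
# Toy Anacapa-style table: one row per ASV, one column per sample, plus a
# taxonomy string. The "sum.taxonomy" column name and the six-rank layout
# are assumptions for illustration only.
tax_table_raw <- data.frame(
  sum.taxonomy = c(
    "Chordata;Actinopteri;Perciformes;Serranidae;Paralabrax;Paralabrax clathratus",
    "Echinodermata;Echinoidea;Camarodonta;Strongylocentrotidae;Strongylocentrotus;",
    "Chordata;Actinopteri;;;;"
  ),
  site_A = c(120, 0, 15),
  site_B = c(30, 44, 0),
  stringsAsFactors = FALSE
)

ranks <- c("Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical helper: split each taxonomy string into one column per rank,
# recording unassigned ranks as NA.
split_taxonomy <- function(tax_strings, ranks) {
  parts <- strsplit(tax_strings, ";", fixed = TRUE)
  padded <- lapply(parts, function(p) {
    length(p) <- length(ranks)        # pad (or truncate) to the rank count
    p[is.na(p) | p == ""] <- NA       # empty ranks become NA
    p
  })
  out <- do.call(rbind, padded)
  colnames(out) <- ranks
  out
}

asv_ids <- paste0("ASV", seq_len(nrow(tax_table_raw)))

# Taxonomy matrix: ASVs in rows, ranks in columns
taxa <- split_taxonomy(tax_table_raw$sum.taxonomy, ranks)
rownames(taxa) <- asv_ids

# Count matrix: ASVs in rows, samples in columns
sample_cols <- setdiff(names(tax_table_raw), "sum.taxonomy")
counts <- as.matrix(tax_table_raw[, sample_cols, drop = FALSE])
rownames(counts) <- asv_ids

print(taxa)
print(counts)
```

With phyloseq installed, matrices shaped like these could be passed to otu_table(counts, taxa_are_rows = TRUE) and tax_table(taxa) and then combined with sample metadata into a single phyloseq object.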
The Shiny app (http://gauravsk.shinyapps.io/ranacapa or ranacapa::runRanacapaApp()) allows users to interact with eDNA results through statistical summaries and interactive plots, displayed in the following tabs:
Figure 1: Taxon accumulation curve as shown in the first tab of ranacapa
Figure 2: Taxonomy heatmap as shown in the first tab of ranacapa
ranacapa depends on Bioconductor v 3.7 and R v 3.5.1. The Shiny app has been tested on Chrome and Firefox on Windows, Mac OS X, and Ubuntu. The package can be installed using the command devtools::install_github("gauravsk/ranacapa"), and the Shiny app is available at http://gauravsk.shinyapps.io/ranacapa.
ranacapa focuses on visualizing and analyzing the taxonomy tables generated by two metabarcode sequence analysis pipelines: Anacapa (https://github.com/limey-bean/Anacapa) and QIIME (http://qiime2.org/). Anacapa generates tab-delimited taxonomy tables, with each row representing the taxonomic identification for each Amplicon Sequence Variant (ASV) and each column representing a sequenced sample. QIIME allows users to export ASV tables in the standard BIOM table format and taxonomy files in .tsv format. Either the Anacapa .txt or the QIIME .biom files can be uploaded as the taxonomy files for ranacapa. ranacapa also expects sample metadata to be uploaded as a tab-delimited .txt file. The ranacapa function validate_input_files() verifies that both the taxonomy table and the metadata file meet certain structural requirements, which are documented in the function help files. The current version of ranacapa accepts both categorical and continuous metadata columns, but continuous values are grouped into bins.
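To make the input requirements more concrete, here is a minimal base R sketch of the kind of structural check and binning described above. It is only an illustration under stated assumptions and does not reproduce validate_input_files(): the function check_inputs, the column names sum.taxonomy, sample, and depth_m, and the choice of three equal-width bins are all hypothetical.

```r
# Toy taxonomy table (rows = taxa, columns = samples) and metadata table.
# All names and values are made up for illustration.
taxonomy <- data.frame(
  sum.taxonomy = c("taxon_1", "taxon_2"),
  S1 = c(10, 0), S2 = c(3, 7), S3 = c(0, 12),
  stringsAsFactors = FALSE
)
metadata <- data.frame(
  sample  = c("S1", "S2", "S3"),
  habitat = c("kelp", "kelp", "sand"),
  depth_m = c(4.5, 12.0, 21.3),
  stringsAsFactors = FALSE
)

# Hypothetical check: returns a character vector of problems (empty if none).
check_inputs <- function(taxonomy, metadata,
                         tax_col = "sum.taxonomy", id_col = "sample") {
  problems <- character(0)
  if (!tax_col %in% names(taxonomy)) {
    problems <- c(problems, paste("taxonomy table lacks column", tax_col))
  }
  sample_cols <- setdiff(names(taxonomy), tax_col)
  # Every sample column in the taxonomy table needs a metadata row
  missing_meta <- setdiff(sample_cols, metadata[[id_col]])
  if (length(missing_meta) > 0) {
    problems <- c(problems, paste("samples without metadata:",
                                  paste(missing_meta, collapse = ", ")))
  }
  # Sample columns should hold numeric read counts
  is_num <- vapply(taxonomy[sample_cols], is.numeric, logical(1))
  if (!all(is_num)) {
    problems <- c(problems, "non-numeric sample columns found")
  }
  problems
}

print(check_inputs(taxonomy, metadata))

# Group a continuous metadata column into three equal-width bins
metadata$depth_bin <- cut(metadata$depth_m, breaks = 3)
print(table(metadata$depth_bin))
```

Returning a vector of messages rather than stopping at the first failure lets a user see every structural problem in one pass, which suits a non-specialist audience.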
We designed ranacapa to be used by eDNA researchers to share the results from their research with community partners. Specifically, we expect that researchers with bioinformatic expertise will use best-practices to assign taxonomy to eDNA datasets using the pipeline of their choice and generate clean taxonomy and metadata files. Researchers will then use ranacapa to share results with their community partners, emphasizing the analyses or visualizations most appropriate to their use case. We document two such partnerships below that showcase how ranacapa can facilitate authentic communication between researchers and community scientists.
eDNA research scientists can use ranacapa to share results, especially interactive taxonomy lists, with natural resource managers. For example, ranacapa was used by researchers at UCLA who partner with resource managers at the Channel Islands National Park to assess the potential for eDNA as a biodiversity monitoring tool in the Southern California Channel Islands. The goal of this ongoing collaboration is to assess whether eDNA metabarcoding studies can provide insights to supplement ongoing management efforts at the park, which currently rely on expensive and time-intensive visual surveys (Lessios 1996, Murphy and Jenkins 2010, Usseglio 2015). Implementing streamlined eDNA-based monitoring may allow a dramatic expansion in the scope and scale of marine ecosystem assessment in California (Edgar et al. 2007, Deiner et al. 2017).
To begin exploring whether eDNA-based studies can supplement visual underwater surveys, resource managers at the Channel Islands National Park Service collected and filtered thirty 1 L water samples for eDNA analysis at permanent monitoring sites inside and adjacent to marine protected areas (MPAs) in the park. Research scientists at UCLA performed metabarcode sequencing of the mitochondrial 12S and CO1 gene regions from these samples, targeting bony fishes, elasmobranchs, and invertebrate taxa. The researchers processed sequences and assigned taxonomy using the Anacapa toolkit. When taxonomy tables were ready, researchers used the ranacapa Shiny app to share results from this pilot study with National Park resource managers.
The taxonomy heatmap (Figure 3) was the most valuable visualization for this collaboration, because it allowed the resource managers to focus on a particular set of key taxa. The heatmap showed that this pilot study detected 36 of the 70 key metazoan taxa monitored by the managers at the species level, and the remaining 34 at the genus, family, or order level. This indicates that eDNA-based studies can likely supplement ongoing management efforts and provide new insights into the spatial and temporal distributions of these key species, especially rare and difficult-to-observe taxa such as endangered or invasive species. The resource managers were also interested in the PCoA plot, which was used to explore whether well-known major biogeographic patterns in the Channel Islands (e.g. turnover of fish communities across gradients in sea surface temperature, ) are detected using eDNA analyses. The value of ranacapa in this scenario was to highlight the strengths and areas for concern in using eDNA to monitor diversity in the Channel Islands. Because eDNA has the potential to improve detection of rare species (especially endangered species or newly introduced exotics), which are difficult to observe visually, this collaboration is continuing. The data from this study are packaged as the demo dataset for the ranacapa Shiny web-app and are available online at XXX.
Students can use ranacapa to interact with results from metabarcoding studies and to learn the basic structure of eDNA datasets. A research-based environmental microbiology course at UCLA (Sanders et al. 2016) used eDNA metabarcoding approaches to study the impact of a recent local wildfire on the plant and soil microbial communities. The goal of this twenty-week course was to provide students an authentic experience in basic microbiology and microbial community ecology research. The instructors helped students develop a research question, design a sampling regime to test their hypotheses, and conduct fieldwork to collect soil samples for eDNA analyses in burned and unburned natural areas. Over the first ten weeks of the course, the instructional team (which included eDNA research scientists) extracted total DNA and sequenced the ITS2 (Gu et al. 2013) and 16S SSU rRNA (Caporaso et al. 2012) metabarcoding regions to characterize the plant, bacterial, and archaeal communities in the student-collected soil samples. The researchers then processed the sequences and assigned taxonomy using the Anacapa toolkit.
Shortly after taxonomy tables were generated, the course instructors introduced students to eDNA data exploration and simple statistical analyses using ranacapa. A key strength of using ranacapa was that, despite having no prior bioinformatics experience, students began exploring their data on an online instance of the Shiny app (http://gauravsk.shinyapps.io/ranacapa) within a single class period. Thus, using ranacapa allowed the instructors to focus their time with the students on biological questions rather than on troubleshooting bioinformatics problems, as had been the case in previous sessions of the course. The course instructors noted that this basic exploration in ranacapa, which was not part of the curriculum in previous iterations of the course, had several positive impacts on students and their research projects. First, ranacapa helped students explore the basic structure of the dataset and begin to understand the relationships between community profile and the various metadata they had collected in the field. Second, ranacapa opened the door to basic diversity analyses; for example, students could easily test their hypotheses regarding the taxonomic diversity of microbes in burned and unburned soils. Third, by significantly reducing the time and difficulty in visualizing soil microbial diversity patterns, ranacapa helped students develop and pursue more sophisticated analyses during the remainder of the course using tools such as STAMP. The taxonomy tables and metadata files used in this course are available online at XXXX.
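The kind of basic diversity comparison mentioned above can be sketched in a few lines of base R. The example below computes the Shannon index per sample and averages it by treatment. It is not taken from the course or from ranacapa: the read counts, sample names, and the helper function shannon are invented for illustration. The diversity() function in vegan computes the same index by default.

```r
# Toy community matrix: rows are soil samples, columns are taxa (read counts).
# All values are invented for illustration.
counts <- matrix(
  c(50, 30, 20,  0,
    40, 40, 10, 10,
    90,  5,  5,  0,
    80, 20,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("U1", "U2", "B1", "B2"), paste0("taxon_", 1:4))
)
treatment <- factor(c("unburned", "unburned", "burned", "burned"))

# Shannon index H = -sum(p * log(p)) over taxa with non-zero abundance
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# One diversity value per sample, then the mean per treatment
H <- apply(counts, 1, shannon)
mean_H <- tapply(H, treatment, mean)

print(round(H, 3))
print(round(mean_H, 3))
```

With only two samples per group this toy comparison is descriptive; a real dataset with more replicates would call for a formal test of the difference between treatments.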
Metabarcode sequencing of environmental DNA is becoming a key tool in a wide variety of ecological studies, and results from these studies are of interest to a broad audience. Our R package and Shiny web-app ranacapa helps users conduct exploratory analyses and visualizations on eDNA datasets, and is a step toward making the results from large eDNA studies more accessible and understandable for a wide range of community research partners. ranacapa is designed to be used as a first step in visualizing results from such studies, and we encourage researchers to perform additional analyses outside of the Shiny app according to the specific project requirements.
We propose three avenues for future work in ranacapa. First, we plan to use ranacapa as the primary tool to present eDNA results from hundreds of samples sequenced by the CALeDNA community science program. The positive experience with reserve managers suggests that open forums for discussing ranacapa output will help strengthen the feedback loop between community partners and researchers. Second, ranacapa will be a key tool in the upcoming undergraduate curriculum module “Pipeline for Undergraduate Microbiome Analysis”, which is being built as a complete suite of analysis and data visualization tools that will be made openly available to undergraduate researchers. Finally, in the long term, we believe that there is great promise in connecting ranacapa to Taxa [] and ultimately to packages that connect with APIs of online biodiversity databases (e.g. taxize, rinat). This will help users explore a much wider range of biodiversity questions, for example, by programmatically asking whether their samples include invasive species that are absent from other nearby sites. Such apps that allow non-technical audiences to easily interact with results from eDNA sequencing studies have great potential to engage community partners with a wide range of backgrounds and interests in primary research.
No competing interests were disclosed.
GSK and ZJG were supported by the US-NSF Graduate Research Fellowship (DGE No. 1650604) during the development of this package. EEC is supported by the CALeDNA program, with funds from University of California President’s Research Catalyst Award (CA-16-376437).
We thank Sabrina Shirazi, Rachel Turba, Chris Dao, and Keith Mitchell for providing feedback on developmental versions of this package. Ranacapa builds on numerous functions that have been made openly available online with a GPL-3 License, namely the “phyloseq-extended” toolkit written by Mahendra Mariadassou (https://github.com/mahendra-mariadassou/phyloseq-extended) and “pairwise.adonis” written by Pedro Martinez Arbizu (https://github.com/pmartinezarbizu/pairwiseAdonis).